Modeling road accident fatalities with underdispersion and zero-inflated counts

In 2013, Thailand was ranked second in the world in road accident fatalities (RAFs), with 36.2 per 100,000 people. During the Songkran festival, which takes place during the traditional Thai New Year in April, the number of road traffic accidents (RTAs) and RAFs are markedly higher than on regular days, but few studies have investigated this issue as an effect of festivity. This study investigated the factors that contribute to RAFs using various count regression models. Data on 20,229 accidents in 2015 were collected from the Department of Disaster Prevention and Mitigation in Thailand. The Poisson and Conway–Maxwell–Poisson (CMP) distributions, and their zero-Inflated (ZI) versions were applied to fit the data. The results showed that RAFs in Thailand follow a count distribution with underdispersion and excessive zeros, which is rare. The ZICMP model marginally outperformed the CMP model, suggesting that having many zeros does not necessarily mean that the ZI model is required. The model choice depends on the question of interest, and a separate set of predictors highlights the distinct aspects of the data. Using ZICMP, road, weather, and environmental factors affected the differences in RAFs among all accidents, whereas month distinguished actual non-fatal accidents and crashes with or without deaths. As expected, actual non-fatal accidents were 2.37 times higher in April than in January. Using CMP, these variables were significant predictors of zeros and frequent deaths in each accident. The RAF average was surprisingly higher in other months than in January, except for April, which was unexpectedly lower. Thai authorities have invested considerable effort and resources to improve road safety during festival weeks to no avail. However, our study results indicate that people’s risk perceptions and public awareness of RAFs are misleading. Therefore, nationwide road safety should instead be advocated by the authorities to raise society’s awareness of everyday personal safety and the safety of others.


Introduction
In 2013, Thailand ranked second, after Libya, in road accident fatalities (RAFs) in a survey of 180 countries worldwide, with an estimated 36 overdispersion problem [20][21][22][23] and other versions such as NB-1, NB-2, and Bayesian NB have been developed [24][25][26][27]. Alternatively, CMP has recently been adopted to overcome overdispersion [28,29] and underdispersion [30], although only a few published studies have addressed the latter. Both dispersion and excessive zero counts are concerns in count data analyses. Searching in the Elsevier Science Direct database for the period 2018-2022 using the keywords: "excessive zeros and road fatality, road accident or road injury," 63 studies were identified that dealt with many zero counts and overdispersion problems using zero-inflated models (see for example [31][32][33][34][35]). To the best of our knowledge, no study has analyzed accident data with underdispersion and excessive zeros. Moreover, classifying different injury severity levels is also of interest in road accident studies.
Logistic regression models have been seen in the literature [36][37][38], along with a range of machine learning techniques, such as regression trees, random trees, random forests, support vector machines, k-nearest neighbors, and deep neural networks [39][40][41][42][43][44]. In addition to count data and categorical data, there is considerable interest in the traffic congestion rate. To account for unobserved heterogeneity across observations, the models under Bayesian frameworks have been proposed in studies of pedestrian safety and signalized intersections [45,46]. To deal with sophisticated models, Bayesian methods have been employed to estimate model Table 1. Previous studies on road safety using count regression and zero-inflated models. For model comparison, if any, the model that performed best is presented in boldface.

Study Outcome variable Model Equidispersion
Michener and Tighe [18] Crash frequency Poisson Hezaveh et al. [19] Crash frequency Poisson and Geographically Weighted Poisson

Overdispersion
Chen et al. [20], Quistberg et al. [23] Number of fatalities NB Obinguar and Iryo-Asano [21] Pedestrian crash frequency NB Das [22] Crash frequency NB Khattak et al. [24] Crash frequency NB-1, NB-2, NB-P and generalized Poisson Kamla et al. [25] Counts of harsh braking incidents NB Random-parameters model and NB Fixed-parameters model Safaei et al. [26] Number of fatal accidents random effect NB parameters, such as studies of correlations and heterogeneity in crash rates [47] and for cyclist safety [48]. However, the predictive models for death and/or injury are limited to Thailand's road safety. The present study fills in this gap in research by using count regression models to investigate the distribution of RAFs in Thailand and their potential association with the road and environment at the accident location.
The remainder of this paper is organized as follows. In the Methods section, a detailed description of the data source is given, and count regression models and test statistics for dispersion, excessive zeros, and goodness of model fitting are discussed. Model selection with the best performance in identifying key factors is provided in the Results section. The Discussion section interprets the results and compares them with previously reported results. The Conclusion section summarizes the overall study findings in terms of the differences between the death counts and the extra zero death counts, as well as the limitations of the study.

Data
Data were collected from the DDPM, Ministry of Interior, for 2015, which was the most recent year for which data are available at the time of this study. A total of 76 disaster prevention and mitigation provincial offices are located across Thailand to handle disaster management at the local level. Each office handles data collection and analysis and reports on disaster damage and loss due to natural and man-made disasters, including RTAs.
In 2015, DDPM reported 25,586 RTAs from emergency aid requests for crash accident victims at the provincial and district levels. Complete data were available for 20,229 cases. The unit of analysis here is the individual RTA in a year of study; the dependent variable is the number of human deaths, and the independent variables are the road characteristics (class, surface, and section), weather conditions, environmental conditions, and month of the year. Covariate variables and their specifics are detailed in Table 2. The national database on road safety supplies public information on the number of RTAs, RAFs, types of vehicles, human demographics, and behaviors (which are beyond the scope of this study).
The road networks in Thailand are classified according to four governing authorities. The national highways are under the control of the Department of Highways. Four levels of national highways are differentiated by the number of digits in their name. The first-level networks connect Bangkok to four regions in Thailand (North, South, Northeast, and Central). The second-level networks connect the first-level networks to several significant provinces over long distances and spread out. The third-level networks connect the first and second-level networks to key locations in many provinces, including the long-distance road parallel to the border and attaching the higher-level networks. Finally, highways at the fourth level link other networks to provinces, districts, or important strategic locations in the provinces.
All first-level highways are either 4-lane, 6-lane, or 8-lane divided roads supporting high traffic density. Most second and third-level roads are mixed between 4-lane divided and undivided roads. The four-level highways are 2-lane undivided roads. Rural highways are constructed and maintained by the Department of Rural Roads. This class of roads links all levels of national highways to districts, sub-districts, and villages. Likewise, it connects places within districts and sub-districts. Urban roads comprise the networks within the major city areas under the authority of the local municipality. These networks typically comprise high-density roads with many junctions, traffic signals, and crossings. Local streets are narrow concretesurface roads, with some being gravel in villages, built and maintained by the local subdistrict administration organization.

Models
In this study, we applied standard Poisson, CMP, zero-inflated Poisson (ZIP), and zeroinflated CMP (ZICMP) regressions to model the number of fatalities in accidents for the year 2015 in relation to road characteristics, weather conditions, environmental conditions, and month of the year.

Poisson regression
The Poisson distribution is appropriate for the dependent variable Y, taking nonnegative integer values 0, 1, 2, . . .. This can be used to model the number of occurrences of an event. In this study, Y represents the number of deaths per accident. The probability mass function (p.m.f.), mean, and variance of the Poisson distribution are listed in Table 3. Poisson regression is limited by the assumption imposed on its variance. The expected death counts λ i at the accident i (i = 1, 2, . . ., n) are calculated using l i ¼ expðX 0 i βÞ to ensure λ i > 0. X 0 i is the vector of covariates and β is the vector of estimable coefficients. Given the p.m.f., link function, and the assumption of independent observations, the log-likelihood function for observation i is given as

Road surface
1: Dry Road skin is in a dry condition that exposes its original surface material.

2: Wet
Road skin is in a wet condition due to the rainstorm or the remaining water after the rainfall. Substantial water on the road causes slippery surfaces and hydroplaning.

1: Straight
Road section is in a straight line, not horizontal, curved, or crooked, continuing in the same direction without deviating.

2: Curve
Road section is in both horizontal and vertical curves or bends directions.

3: Crossing and others
Other environmental properties of the road section include junctions, pedestrian crossings, and roadwork obstacles.

Weather condition
1: Clear A condition in which a road accident occurs in sunny weather, not cloudy, rainy, foggy, smoky, or dusty conditions that reduce daylight or visibility.

2: Fog
A condition in which a road accident occurs in foggy weather, which appreciably reduces visibility. It may include smoke from roadside fire and dust from roadwork and nearby construction.

3: Rain
A condition in which a road accident occurs in rainy weather, which appreciably reduces visibility and causes wet road surfaces.

Light condition
1: Day A condition in which a road accident occurs in daylight.

2: Night with light
A condition in which a road accident occurs in the dusk or darkness with artificial light (from streetlights).

3: Night without light
A condition in which a road accident occurs in the dusk or darkness without artificial light (from streetlights). Summing over n observations, the log-likelihood function is

Conway-Maxwell-Poisson regression
The CMP distribution is a generalization of the Poisson distribution that serves to model both underdispersed and overdispersed data. This distribution was originally proposed by Conway et al. [49] and subsequently implemented by Shimueli et al. [50] and Sellers et al. [51]. The p. m.f., mean, and variance are summarized in Table 3. CMP regression overcomes the restrictive assumption of Poisson regression by defining an additional dispersion parameter, n i ¼ expðS 0 i δÞ. S 0 i is the vector of covariates and δ is the vector of estimable coefficients. The likelihood function for observation i has the form The log-likelihood function can be written as

Zero-inflated Poisson regression
The ZIP distribution was used to model count data with an excess of zero counts. It is assumed that with probability π, the only possible observation is zero, and with probability 1 − π, a Poisson (λ) random variable is observed. In our case, the former represents the proportion of actual non-fatal accidents. It is defined by always-zero death counts. The latter represents the

PLOS ONE
proportion of crashes with or without fatalities, that are natural count data including zeros. A crash with at least one death is a fatal accident. A crash without fatalities is a common nonfatal accident. It is defined by not-always-zero death counts. Our zeros are a mixture of actual and common non-fatal accidents. The ZIP distribution has the following form: The resulting p.m.f., mean, and variance are summarized in Table 3.
The ZIP model can be generalized by a regression model that allows both the Poisson parameter λ and the weight parameter π to vary. The model has two parts: the Poisson count model predicting zeros and non-zero counts (count component of the model), and the logit model predicting excess zeros, which is the so-called always zeros (zero component of the model). W i 0 is the vector of covariates and γ is the vector of estimable coefficients. The logit link is defined as follows: If π is not a function of λ, then the log-likelihood function for the ZIP regression is

Zero-inflated Conway-Maxwell-Poisson regression
A suggested model for underdispersed or overdispersed count data with excessive zeros is the ZICMP distribution proposed by Sellers et al. [52]. This generalization of the ZIP distribution is a mixture of a degenerate distribution at zero with probability π and a CMP(λ, ν) distribution with probability 1 − π. The resulting p.m.f., mean, and variance are summarized in Table 3. The ZICMP regression allows both the CMP parameter (λ, ν) and weight parameter π. It has two parts: the CMP count model and the logit model. The log-likelihood function for the ZICMP model is given by The maximum likelihood method was used to estimate the coefficients of all four regression models used in this study.

Dispersion test
The question arises as to whether there is any evidence of dispersion. First, in exploratory data analysis, when the mean of death counts is greater (less) than its variance, an underdispersion (overdispersion) is observed. In addition, the likelihood ratio test (LRT) can be performed to test the dispersion in the CMP regression. According to the relationship between the mean and variance of the CMP distribution such that VarðYÞ EðYÞ � 1 n . This yields overdispersion and underdispersion for ν < 1 and ν > 1, respectively. In particular, when ν = 1, the CMP model collapses into the Poisson model. Hence, the test of H 0 : ν = 1 versus H 1 : ν 6 ¼ 1 can be interpreted as testing whether the Poisson model is preferred over the CMP model. H 1 does not specify the direction of data dispersion. In this case, the LRT statistic is given by where log Lðb;n ¼ 1;ĝÞ and log Lðb;n;ĝÞ are the maximized log-likelihood under the ZIP and ZICMP regression models, respectively.

Zero-inflation test
The score test for zero-inflation in a Poisson distribution is to determine whether an extra proportion of zeros (always zeros part) is added to the common count part of the discrete Poisson distribution. Under the null hypothesis (H 0 : π = 0), the score statistic has an asymptotic chisquared distribution with one degree of freedom [53]. The score statistic is defined as follows: where n is the total number of observations; n 0 is the number of zeros in the data; � y is the mean of the death counts (the estimate of the Poisson parameter under the null hypothesis); and p 0 ¼ expðÀ � yÞ. Typically, CMP is a flexible two-parameter distribution for overdispersed and underdispersed count data. The question of addressing the excessive number of zero counts under CMP is of interest, leading to the ZICMP model. To test the CMP model against the ZICMP alternative, the LRT test is determined by À 2 log Lðb;n;ĝ ¼ 0Þ À log Lðb;n;ĝÞ h i � w 2 1 , where log Lðb;n;ĝ ¼ 0Þ and log Lðb;n;ĝÞ are log-likelihood values associated with CMP MLEs and ZICMP MLEs, respectively.

Goodness of fit and model comparison
To assess model adequacy, the generalized Pearson χ2 statistic is used, and it is computed as follows: The numerator is the squared difference between the observed death count and the expected value of the death count, and the denominator is the expected value of the death count. In large samples, the distribution of this statistic is approximately chi-squared with n − k degrees of freedom, where n is the total number of observations, and k is the number of estimated parameters, including the intercept. A small value of the statistic supports the decision not to reject the null hypothesis. If the null hypothesis cannot be rejected, then the model is adequate. The assessment criteria used for model comparisons are often based on several likelihood measures. Two of the most extensively used measures are the log-likelihood value and Akaike Information Criterion (AIC). A model that provides a larger log-likelihood value is considered a better model. The AIC is defined as AIC ¼ À 2 ln LðθÞ þ 2k, where LðθÞ denotes the likelihood function of a given fitted regression model andθ is a vector of the estimated parameters in the model. The smaller the AIC, the better the model.

Software used
The results of this study were obtained using R 4.1.0 [54] with four main packages for count models: MASS, pscl, COMPoissonReg, and vcdExtra. The standard Poisson model within a generalized linear model framework was implemented in the R package MASS using the glm() function [55]. The zero-inflated extension of the Poisson distribution is provided by the zeroinfl() function in the pscl package [56]. Currently, implementation of the count model for mean, dispersion, and zero-inflation is available in the COMPoissonReg package using the glm.cmp() function [57]. The score test statistic is widely used to test the null hypothesis that the data do not characterize the extra zeros for the Poisson distribution. The score test [53] was implemented as the zero.test() function in R under the package vcdExtra [58].

Descriptive statistics
The distribution of the death counts is summarized in Table 4 and presented in Fig 2. The number of deaths per accident in 2015 ranged from 0 to 6, with many accidents reporting 0 or 1 death per crash. The proportion of deaths was the highest for the zero count (73%) and the second highest for one count (25.79%). Fatal accidents accounted for approximately 27% of the total incidence (n = 20,229) and 6,109 deaths. 73% were non-fatal accidents, in which at least one person was injured, but no deaths occurred. The minimum number of fatal accidents was 332 (in October) and the maximum was 611 (in January). Non-fatal accidents were observed at least in September (737) and most in April (3,090).
The monthly RTA averages were 465.7, 1,220.1, and 1,685.8, for fatal, non-fatal, and total accidents, respectively. The number of RTAs was highest in April, followed by January and December. In these months, important festivals are celebrated in Thailand, with long holiday periods of more than five days. However, the fatalities per accident per month were lower in these festival months than in other months. The highest proportion (0.396) was observed in  November and the lowest (0.165) in April. A greater number of non-fatal injuries was observed during celebrations. This plays a crucial role in excessive zero participation in the data. Six explanatory variables were used to estimate RAFs (Table 4). Many accidents occurred on national (43%) and rural highways (14%). These highways are constructed and maintained by the central government. The remaining accidents occurred on urban roads (15%) and local streets (28%). They are built and maintained by local government administrations. The national highway in Thailand covers 52,189 km, accounting for 7.5% of the road network in the country, the rural highway covers 49,123 km (7.0%), and the local government administration roads and streets cover 597,667 km (85.5%). Interestingly, although national highway coverage is 11.4 times lower than the urban road and the local street network, there seems to be an equal chance of RTAs (43%) occurring on both networks. Unfortunately, data on vehicle types and traffic volumes for the various types of roads are not available. Consequently, we cannot report an average of vehicles per day on the roads and which vehicle type occurs the most.
The most common road geometry for which the accidents occurred was straight section roads (68%), followed by pedestrian crossings (20%), and then curves (12%). RTAs were two times as high on straight road sections compared to the crossings and curved sections combined. Dry surface roads were at elevated danger (93%), whereas roads with wet surfaces were associated with lower danger (7%). Most accidents occurred under clear weather conditions (93%). Driving on dry surface roads and under clear weather may increase the risk of fatal accidents because drivers are less cautious in these conditions.
In terms of light conditions, accidents were reported in descending order of daylight (61%), at night with light on (23%), and at night without light (16%). This suggests that the visibility of the driver is one of the main factors involved in the cause of RTAs. During festival months, New Year celebrations (December: 10.1% and January: 13.2%), and Songkran holidays (April: 18.3%) the frequency of accidents increased, resulting in a sizable number of fatalities and injuries. Table 5 presents the results of fitting the death count data using the four probability models. When fitting the data to the Poisson distribution, the average number of fatalities per road accident was 0.302 (exp(-1.1974)). The estimated dispersion parameter in the Poisson distribution is less than one, indicating count data with underdispersion. This was improved by fitting The CMP model simply decreases the variability in the Poisson distribution but does not necessarily accommodate excess zeros. The score statistic provided evidence that the observed zeros exceeded the zero bound of the Poisson distribution (177.69, df = 1, p < 0.0001). Interestingly, fitting the zero-inflated models to the data provided a different conclusion. The always-zero proportion parameter in the zero-inflated model was not statistically significant. Neither the ZIP model nor the ZICMP model captured the excess zeros in the data.

Fitting fatality count distributions
The ZI coefficients refer to separating actual non-fatal accidents from crashes with or without deaths. Exponentiating these coefficients in both ZIP (exp(-12.6700) � 0) and ZICMP (exp (-14.4908) � 0) returned an approximate zero value, suggesting that the proportion of actual non-fatal accidents was zero. Thus, observations of zero deaths seem as expected, not excess, in the theorical count distribution. Adding zero-inflation to the model did not improve the loglikelihood or reduce the AIC. Further exploration is required to fully explain the two processes of zeros in the RTA data. In the next section, covariates are added to these models to improve the fit.

Regression models with covariates
Count regression models were applied to investigate the relationship between the number of deaths and covariates, including road characteristics, weather conditions, light conditions, and month of the year. Before developing the model, the correlations among independent variables were considered using Cramer's V, which produced values ranging from 0.029 to 0.591. These results showed no significant correlation among covariates and supported the notion that there was no multicollinearity in the regression models.
The results of the coefficients and standard errors are listed in Table 6, and the 95% confidence intervals are displayed in Fig 3. Standard Poisson regression and CMP regression were used for the count part modeling with and without an additional dispersion parameter, respectively. Then, the zero-inflated models with ZIP and ZICMP regression were adopted to enhance the model fitting in zero-part modeling.

Count part in Poisson and CMP regression
According to the log-likelihood, AIC, and Pearson's chi-squared values, CMP regression outperformed ordinary count regression. All six covariates were statistically significant predictors of the fatality distribution. Driving on national highways or dry surfaces is associated with a higher fatality rate. On average, the number of deaths at crossings is likely to decrease compared with that on straight road sections. However, the opposite is true when approaching curves. The positive light condition coefficient implied that driving during daylight was much safer than driving at night, with or without light on the streets. RAFs were most common on sunny days. A higher risk was observed during rain, but the risk was lower in foggy weather. Compared to the occurrence of RAFs in January, the fatality risk decreased in April, but increased in other months. For each predictor, the 95% confidence intervals estimated from the CMP regression were wider than those from the Poisson regression. This is because of the adjustment for the underdispersion phenomenon.

Count part and zero part in ZIP and ZICMP regression
The ZICMP model provides a better fit to fatality frequency data than the ZIP model. Five risk factors are statistically significant for predicting the count part distribution, whereas only the month of the year is an important covariate for extracting excessive zeros from the data. The zero-inflated model coefficients separate actual non-fatal accidents from crashes with or without deaths. The actual number of non-fatal accidents was 2.37 (exp(0.8632)) times higher in April than in January. The estimated proportion of the always-zeros was 0.275 and 0.473 (refer to Eq (6)) for the reference months of January and April, respectively. More specifically, the extra zeros were associated with an excessive number of no deaths that can be explained by the month factor.
Using January as a baseline, the probability of zero deaths significantly increases in April, but decreases in August and October. The probability of zero deaths in other months did not differ from that in the first month of the year. These results confirm the exploratory data analysis results that the death rate is lower during festival periods, despite an unusual increase in the number of non-fatal accidents. This information was not explicit when fitting the data with Poisson or CMP regression. For the count part, the directions of associations between the response and the roadway class, surface, and section as well as lighting and weather conditions, were the same as those presented in the CMP regression result. However, the magnitudes were clearly different because the ZICMP model allows for variable adjustment in underdispersed data along with zero-inflation. The corresponding confidence intervals were estimated to reflect both phenomena.
When controlling for covariates, the expected number of deaths per accident increased from 0.333 under the Poisson regression to 0.393 under the ZICMP regression. In other words, after adjusting for underdispersion and zero-inflation, for every ten accidents, the number of human deaths approximately increased from three to four people. This finding is of vital importance for public health concerns and road safety issues in Thailand.

Model selection
The Poisson model is the most widely used model for analyzing count data. However, the Poisson model has the drawback of assuming a unit variance-to-mean ratio. In road safety data, this restriction may be violated because the observed count may present a variance that is larger or smaller than expected, resulting in overdispersion or underdispersion, respectively.

PLOS ONE
NB regression is often used to model RTAs, RAFs, or RAIs with overdispersion, such as, the number of injury accidents on road bridges in Norway during 2010-2016 [59]; RAFs and RAIs with spatial panel data analysis in Thailand during 2012-2016 [60]; the number of human deaths per crash in the Oromia region of Ethiopia [61]; and safety performance functions for urban intersections of Antwerp in Belgium [62]. CMP regression is an alternative model to NB regression and has the advantage of being able to handle both overdispersed and underdispersed counts. However, CMP is not popular because of the complexity in estimating the mean and dispersion parameters using iterative methods for nonlinear optimization and is limited by the availability of computational tools. A generalized linear model based on CMP distribution was developed by Guikema et al. [63] and was first applied to the analysis of RTA data with overdispersion [28]. Two studies of crash data were from signalized four-legged intersections in Toronto, Ontario in 1995, and from rural four-lane divided and undivided highways in Texas. The results of the goodness-of-fit test statistic and predictive performance were similar for the CMP and NB regressions.
Modeling RTAs, RAFs, or RAIs with underdispersed data has rarely been reported. A study in 2010 [30] showed that the CMP model for underdispersion provides a better fit to RTAs than the Poisson and gamma models in an analysis of 162 railway-highway crossings in South Korea between 1998 and 2002. Further, a recent report in 2020 introduced clustered longitudinal CMP with gamma random effects to model RTAs in Mauritius [29].
Excessive zero counts are highly skewed to the right and may cause a smaller conditional mean than the true mean value, resulting in overdispersion or underdispersion. The zeroinflated version models of Poisson, NB, and CMP can be applied to overcome such problems. For example, Poisson, NB, ZIP, and ZINB were used to model the number of RAFs on the roads with the highest number of accidents in Malaysia, F0050 [64], and in the Oromia region of Ethiopia [61]. Modeling the zero-inflated regression of Poisson and NB of the number of RTAs on roads such as the F001 Jalan Jb Air Hitam and FT050 Kluang-A/ Hitam-B/Pahat roads in Malaysia was reported in [65] and [66], respectively. The results suggest that the zero-inflated version models offer better statistical performance than traditional models.
Recently, the ZICMP distribution and regression, along with their computational "COM-PoissonReg" package in R, were developed [52]. ZICMP can account for excess zeros and data with overdispersion or underdispersion. The model performance was compared to that of ZIP and ZINB in a study of educational data with overdispersion. As expected, ZICMP and ZINB exhibited similar performances in capturing extra zeros, overdispersion, and surpassing ZIP. To the best of our knowledge, this has not yet been used for road safety analysis.
In this study, the Poisson, CMP, ZIP, and ZICMP regression models were fit to RAFs that occurred in Thailand in 2015. This is an interesting area because excess zeros and underdispersion are rarely observed in road safety data. The ZICMP model performed the best, marginally outperforming CMP with a minute, almost negligible, difference in log-likelihood values. This suggests that having many zeros does not necessarily mean a zero-inflated model is required or that important covariates are missing. Therefore, issues of concern that need to be addressed, and the possibility of finding associated factors dictate the model-selection process. These two distinct aspects are clear. CMP provides insight into death counts per crash if there are no extra zeros. When an accident occurs, the driver or passenger may be injured or die. ZICMP accounts for excessive zeros, suggesting an unusual increase in the number of zero death counts on roads. This can be viewed as unexpected, non-fatal accidents. The extraction of such information requires the identification of associated covariates.

Month, road, and environmental effects
The month of the year played a vital role in extracting different messages from the underdispersed fatality data. In the case of death counts, the count part in CMP addresses the lower number of fatalities in April and the larger number of deaths in other months compared to January. For the extra zeros, the zero part in ZICMP captures excess zero deaths in April, but figures are lower in August and October than in January. However, the month did not affect the death counts in the count part of the ZICMP model. This indicates that different sets of variables are required to interpret different aspects of the data. Our results serve as the basis of the model formulation. These two generalized linear mixed models allow for different sets of relationships between the average, dispersion, or zero structure and its associated covariates [52].
Two key findings were highlighted in April: a lower number of deaths and a greater number of excess zero deaths on roads. This is attributed to April being a festival month. For centuries, April has been celebrated annually as the traditional Thai New Year, known as the Songkran festival (the period of 11-17 April every year). It is rich in culture and tradition with water blessing ceremonies for prosperity and wellness as well as the world-renowned water parties for ritual cleansing and a fresh start. An unacceptable drawback is that it coincides with the time of the highest crash frequencies of the year, called the ''seven deadly days." Heavy road traffic occurs at the beginning and end of the long holidays when there is an excess of public transportation and numerous private vehicles on the roadways connecting the provinces. Because rail public transit and railroad networks in Thailand are not efficiently served, public buses must double or even triple the number of services to accommodate the most prominent human migration of the year. The traffic flow rate on the road is low, which causes traffic jams and accidents. Road accidents may cause more congestion, particularly on interconnected and national highways.
For decades, the government accorded a lot of time, budget, and effort into road safety campaigns to raise awareness and change attitudes and behaviors, particularly during festivals. However, thus far, all earlier campaigns have shown no reduction in fatalities [9,[11][12][13]. A new insight from our study shows that the number of deaths is not very different between the months, but the number of non-fatal accidents is. Interestingly, our findings contradict Thai people's risk beliefs and public awareness of RAFs. Advertising campaigns emphasize death counts and do not mention accident counts, claiming that the number of deaths on the roads is dramatically higher than usual during festivals. This is deceptive. Accidents related to injuries and property damage are also harmful to the social and economic aspects of society. Disability after RAIs is a massive health problem and the economic consequences have resulted in a loss of gross domestic product in the country by 3%-6% or 273-545 million Baht (US$ 8.3-16.5 million, based on an exchange rate of 33 Baht to US$ 1) [60]. Therefore, the authorities are recommended to implement adequate road safety measures for road users daily, not just for a specific period.
Most studies in the literature concur that the factors affecting RTAs and RAFs are roads and the environment. The present study yielded comparable results. The correlations between traffic accidents and road conditions, environmental factors and population were significant [67]. The female sex, the presence of alcohol, and a high degree of curvature in the road all increase the likelihood of severe traffic accidents [68]. Human behavior and demographics are also known factors impacting road safety [69] and thus may be considered in further studies in Thailand.

Conclusion
In this study, the CMP regression and its zero-inflation version, ZICMP regression, were considered for an underdispersion problem and excessive zero death counts for RAFs in Thailand.
Neither Poisson regression nor ZIP regression is appropriate for modeling this type of count response. Because underdispersion is not accounted for, the classic models produce a lower standard error of the estimated parameters than expected. This study highlights the fact that having many zeros does not necessarily mean that a zero-inflated model is required; it depends on the issues of concern that need to be addressed. Finding associated factors to capture excessive zeros is challenging but can provide insights into what happened. A key finding is that the month of the year has the power to determine the difference between the death counts and the extra zero death counts, referred to as uncommon non-fatal accidents in this study. Using January as a baseline, the ZICMP regression revealed that the largest number of non-fatal accidents were in April when people celebrated the traditional Thai New Year Songkran festival. This figure was not obtained using the Poisson or CMP regression. The CMP regression shows a lower fatality rate during this festival month because the number of deaths did not change much throughout the year, but the number of accidents was considerably higher in April than in other months. Other covariates, including road (class, surface, and section) and environment (weather and light conditions), were significant predictors of RAFs in Thailand, as commonly reported elsewhere.
Underdispersion problems in road accident data are rare; however, they have been reported in relation to RAFs in Thailand. In addition to the CMP model, the geometric model may serve as an alternative to the familiar Poisson model. The advantage of the geometric distribution is that it is a mixture of Poisson and an exponential distribution; thus, it incorporates some form of heterogeneity. This point is left for future research. In addition, human behavior, vehicles, roads, and the environment interact with each other during accidents. A possible enhancement of road safety databases, in which various authorities are involved in supplying such data, is a key to effective road safety measures, to reduce road accidents, injuries, and fatalities, not only in Thailand but worldwide.
Supporting information S1 Table. Codes of the independent variables. Each categorical covariate was divided into different levels. Different levels were coded in different values. (DOCX) S1 File Raw data. Data specific to this study is provided. The number of deaths per accident in 2015 ranged from 0 to 6, with many accidents reporting 0 or 1 death per crash.